Module 0: Lecture Notes 3
Tools for Deep Learning
FAQs (Read These First!)
- Use your BU student email to activate Google Colab Pro.
- You may use VS Code, Cursor, or Antigravity-style workflows.
- All instructor demos and support will use VS Code.
- Colab runtimes are ephemeral. Anything not saved to Google Drive will be lost.
- Always verify that your runtime is using a GPU or TPU before running large models.
- Do not store API keys or credentials directly in notebooks or public repositories.
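The last point deserves a concrete pattern. Rather than pasting a key into a cell, read it from an environment variable; the variable name below is just an example, not a course requirement:

```python
import os

def get_api_key(name: str) -> str:
    """Read a credential from the environment instead of hardcoding it in a notebook."""
    key = os.environ.get(name)
    if key is None:
        raise KeyError(f"Set the {name} environment variable before running.")
    return key

# In practice, export the variable in your shell (or use Colab's Secrets panel);
# setting it here is only for demonstration.
os.environ["MY_API_KEY"] = "dummy-value"
print(get_api_key("MY_API_KEY"))  # → dummy-value
```

This way the notebook can be committed or shared without leaking the credential.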
1 Using Jupyter Notebooks
This section describes how to edit and run the code in each section of this book using the Jupyter Notebook. Make sure you have installed Jupyter and downloaded the code as described in :ref:chap_installation. If you want to know more about Jupyter, see the excellent tutorial in its documentation.
1.1 Editing and Running the Code Locally
Suppose that the local path of the book’s code is xx/yy/d2l-en/. Use the shell to change the directory to this path (cd xx/yy/d2l-en) and run the command jupyter notebook. If your browser does not open the interface automatically, go to http://localhost:8888 and you will see the Jupyter interface and all the folders containing the code of the book, as shown in Figure 1.
You can access the notebook files by clicking on the folder displayed on the webpage. They usually have the suffix “.ipynb”. For the sake of brevity, we create a temporary “test.ipynb” file. The content displayed after you click it is shown in Figure 2. This notebook includes a markdown cell and a code cell. The content in the markdown cell includes “This Is a Title” and “This is text.”. The code cell contains two lines of Python code.
Double click on the markdown cell to enter edit mode. Add a new text string “Hello world.” at the end of the cell, as shown in Figure 3.
As demonstrated in Figure 4, click “Cell” → “Run Cells” in the menu bar to run the edited cell.
After running, the markdown cell is shown in Figure 5.
Next, click on the code cell. Multiply the elements by 2 after the last line of code, as shown in Figure 6.
You can also run the cell with a shortcut (“Ctrl + Enter” by default) and obtain the output result from Figure 7.
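Since the figures are not reproduced here, the edited code cell might look like the following sketch; the exact cell contents are assumed, not taken from the figures:

```python
# The original two lines of the code cell (assumed contents).
x = [1, 2, 3]
x  # the original last line displayed the list

# The edit described above: multiply the elements by 2.
doubled = [2 * v for v in x]
print(doubled)  # → [2, 4, 6]
```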
When a notebook contains more cells, we can click “Kernel” → “Restart & Run All” in the menu bar to run all the cells in the entire notebook. By clicking “Help” → “Edit Keyboard Shortcuts” in the menu bar, you can edit the shortcuts according to your preferences.
1.2 Advanced Options
Beyond local editing, two things are quite important: editing the notebooks in the markdown format and running Jupyter remotely. The latter matters when we want to run the code on a faster server. The former matters since Jupyter’s native ipynb format stores a lot of auxiliary data that is irrelevant to the content, mostly related to how and where the code is run. This is confusing for Git, making reviewing contributions very difficult. Fortunately, there is an alternative: native editing in the markdown format.
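To see why the ipynb format is hard on Git, here is a minimal sketch of the JSON an .ipynb file stores (fields simplified); note how execution counts and metadata sit next to the actual content:

```python
import json

# Minimal sketch of the on-disk .ipynb structure (simplified, for illustration).
nb = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},  # kernel and language info normally lands here
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["This is text."]},
        {"cell_type": "code", "metadata": {}, "execution_count": 3,
         "outputs": [], "source": ["x = [1, 2, 3]\n", "x"]},
    ],
}
text = json.dumps(nb, indent=1)
# Rerunning the notebook changes execution_count and outputs even when the
# source is untouched; that churn is what pollutes Git diffs.
print(text.count("execution_count"))  # → 1
```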
1.2.1 Markdown Files in Jupyter
If you wish to contribute to the content of this book, you need to modify the source file (md file, not ipynb file) on GitHub. Using the notedown plugin we can modify notebooks in the md format directly in Jupyter.
First, install the notedown plugin, run the Jupyter Notebook, and load the plugin:
pip install notedown # You may need to uninstall the original notedown.
jupyter notebook --NotebookApp.contents_manager_class='notedown.NotedownContentsManager'
You may also turn on the notedown plugin by default whenever you run the Jupyter Notebook. First, generate a Jupyter Notebook configuration file (if it has already been generated, you can skip this step).
jupyter notebook --generate-config
Then, add the following line to the end of the Jupyter Notebook configuration file (for Linux or macOS, usually in the path ~/.jupyter/jupyter_notebook_config.py):
c.NotebookApp.contents_manager_class = 'notedown.NotedownContentsManager'
After that, you only need to run the jupyter notebook command to turn on the notedown plugin by default.
1.2.2 Running Jupyter Notebooks on a Remote Server
Sometimes, you may want to run Jupyter notebooks on a remote server and access them through a browser on your local computer. If Linux or macOS is installed on your local machine (Windows can also support this function through third-party software such as PuTTY), you can use port forwarding:
ssh myserver -L 8888:localhost:8888
In the command above, myserver is the address of the remote server. We can then use http://localhost:8888 to access the Jupyter notebooks running on the remote server myserver. We will detail how to run Jupyter notebooks on AWS instances later in these notes.
1.2.3 Timing
We can use the ExecuteTime plugin to time the execution of each code cell in Jupyter notebooks. Use the following commands to install the plugin:
pip install jupyter_contrib_nbextensions
jupyter contrib nbextension install --user
jupyter nbextension enable execute_time/ExecuteTime
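If you only need a rough measurement and do not want to install a plugin, the standard library works inside any cell. This is a sketch, not a replacement for ExecuteTime:

```python
import time

start = time.perf_counter()
total = sum(i * i for i in range(100_000))  # stand-in for a slow computation
elapsed = time.perf_counter() - start
print(f"sum = {total}, took {elapsed:.4f}s")
```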
1.3 Summary
- Using the Jupyter Notebook tool, we can edit, run, and contribute to each section of the book.
- We can run Jupyter notebooks on remote servers using port forwarding.
2 Using Google Colab
:label:sec_colab
We describe how to run this book on AWS in :numref:sec_sagemaker and :numref:sec_aws. Another option is running this book on Google Colab if you have a Google account.
To run the code of a section on Colab, simply click the Colab button as shown in :numref:fig_colab.
:width:300px
:label:fig_colab
If this is your first time running a code cell, you will receive a warning message as shown in :numref:fig_colab2. Just click “RUN ANYWAY” to ignore it.
:width:300px
:label:fig_colab2
Next, Colab will connect you to an instance to run the code of this section. Specifically, if a GPU is needed, Colab will automatically request a connection to a GPU instance.
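To verify from inside the notebook that you actually received a GPU runtime, a framework-agnostic check is to look for the NVIDIA driver tool on the PATH (if you use PyTorch, torch.cuda.is_available() is the usual call; the helper below deliberately avoids importing any framework):

```python
import shutil

def gpu_visible() -> bool:
    """Rough proxy for a GPU runtime: is nvidia-smi on the PATH?"""
    return shutil.which("nvidia-smi") is not None

print("GPU runtime" if gpu_visible() else "CPU-only runtime")
```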
2.1 Summary
- You can use Google Colab to run each section’s code in this book.
- Colab will automatically request a GPU instance if one is needed by the code of a section.
2.2 Exercises
- Open any section of this book using Google Colab.
- Edit and run any section that requires a GPU using Google Colab.
3 Using Amazon SageMaker
:label:sec_sagemaker
Deep learning applications may demand more computational resources than your local machine can offer. Cloud computing services allow you to run the GPU-intensive code of this book more easily using more powerful computers. This section will introduce how to use Amazon SageMaker to run the code of this book.
3.1 Signing Up
First, we need to sign up for an account at https://aws.amazon.com/. For additional security, using two-factor authentication is encouraged. It is also a good idea to set up detailed billing and spending alerts to avoid surprises, e.g., if you forget to stop running instances. After logging into your AWS account, go to your console and search for “Amazon SageMaker” (see :numref:fig_sagemaker), then click it to open the SageMaker panel.
:width:300px
:label:fig_sagemaker
3.2 Creating a SageMaker Instance
Next, let’s create a notebook instance as described in :numref:fig_sagemaker-create.
:width:400px
:label:fig_sagemaker-create
SageMaker provides multiple instance types with varying computational power and prices. When creating a notebook instance, we can specify its name and type. In :numref:fig_sagemaker-create-2, we choose ml.p3.2xlarge: with one Tesla V100 GPU and an 8-core CPU, this instance is powerful enough for most of the book.
:width:400px
:label:fig_sagemaker-create-2
:begin_tab:mxnet The entire book in the ipynb format for running with SageMaker is available at https://github.com/d2l-ai/d2l-en-sagemaker. We can specify this GitHub repository URL (:numref:fig_sagemaker-create-3) to allow SageMaker to clone it when creating the instance. :end_tab:
:begin_tab:pytorch The entire book in the ipynb format for running with SageMaker is available at https://github.com/d2l-ai/d2l-pytorch-sagemaker. We can specify this GitHub repository URL (:numref:fig_sagemaker-create-3) to allow SageMaker to clone it when creating the instance. :end_tab:
:begin_tab:tensorflow The entire book in the ipynb format for running with SageMaker is available at https://github.com/d2l-ai/d2l-tensorflow-sagemaker. We can specify this GitHub repository URL (:numref:fig_sagemaker-create-3) to allow SageMaker to clone it when creating the instance. :end_tab:
:width:400px
:label:fig_sagemaker-create-3
3.3 Running and Stopping an Instance
Creating an instance may take a few minutes. When it is ready, click on the “Open Jupyter” link next to it (:numref:fig_sagemaker-open) so you can edit and run all the Jupyter notebooks of this book on this instance (similar to steps in :numref:sec_jupyter).
:width:400px
:label:fig_sagemaker-open
After finishing your work, do not forget to stop the instance to avoid being charged further (:numref:fig_sagemaker-stop).
:width:300px
:label:fig_sagemaker-stop
3.4 Updating Notebooks
:begin_tab:mxnet Notebooks of this open-source book will be regularly updated in the d2l-ai/d2l-en-sagemaker repository on GitHub. To update to the latest version, you may open a terminal on the SageMaker instance (:numref:fig_sagemaker-terminal). :end_tab:
:begin_tab:pytorch Notebooks of this open-source book will be regularly updated in the d2l-ai/d2l-pytorch-sagemaker repository on GitHub. To update to the latest version, you may open a terminal on the SageMaker instance (:numref:fig_sagemaker-terminal). :end_tab:
:begin_tab:tensorflow Notebooks of this open-source book will be regularly updated in the d2l-ai/d2l-tensorflow-sagemaker repository on GitHub. To update to the latest version, you may open a terminal on the SageMaker instance (:numref:fig_sagemaker-terminal). :end_tab:
:width:300px
:label:fig_sagemaker-terminal
You may wish to commit your local changes before pulling updates from the remote repository. Otherwise, simply discard all your local changes with the following commands in the terminal:
:begin_tab:mxnet
cd SageMaker/d2l-en-sagemaker/
git reset --hard
git pull
:end_tab:
:begin_tab:pytorch
cd SageMaker/d2l-pytorch-sagemaker/
git reset --hard
git pull
:end_tab:
:begin_tab:tensorflow
cd SageMaker/d2l-tensorflow-sagemaker/
git reset --hard
git pull
:end_tab:
3.5 Summary
- We can create a notebook instance using Amazon SageMaker to run GPU-intensive code of this book.
- We can update notebooks via the terminal on the Amazon SageMaker instance.
3.6 Exercises
- Edit and run any section that requires a GPU using Amazon SageMaker.
- Open a terminal to access the local directory that hosts all the notebooks of this book.
4 Using AWS EC2 Instances
:label:sec_aws
In this section, we will show you how to install all libraries on a raw Linux machine. Recall that :numref:sec_sagemaker discussed how to use Amazon SageMaker, while building an instance by yourself costs less on AWS. The walkthrough includes three steps:
- Request a GPU Linux instance from AWS EC2.
- Install CUDA (or use an Amazon Machine Image with preinstalled CUDA).
- Install the deep learning framework and other libraries for running the code of the book.
This process applies to other instances (and other clouds), too, albeit with some minor modifications. Before going forward, you need to create an AWS account; see :numref:sec_sagemaker for more details.
4.1 Creating and Running an EC2 Instance
After logging into your AWS account, click “EC2” (:numref:fig_aws) to go to the EC2 panel.
:width:400px
:label:fig_aws
:numref:fig_ec2 shows the EC2 panel.
:width:700px
:label:fig_ec2
4.1.1 Presetting Location
Select a nearby data center to reduce latency, e.g., “Oregon” (marked by the red box in the top-right of :numref:fig_ec2). If you are located in China, you can select a nearby Asia Pacific region, such as Seoul or Tokyo. Please note that some data centers may not have GPU instances.
4.1.2 Increasing Limits
Before choosing an instance, check if there are quantity restrictions by clicking the “Limits” label in the bar on the left as shown in :numref:fig_ec2. :numref:fig_limits shows an example of such a limitation. The account currently cannot open “p2.xlarge” instances in this region. If you need to open one or more instances, click on the “Request limit increase” link to apply for a higher instance quota. Generally, it takes one business day to process an application.
:width:700px
:label:fig_limits
4.1.3 Launching an Instance
Next, click the “Launch Instance” button marked by the red box in :numref:fig_ec2 to launch your instance.
We begin by selecting a suitable Amazon Machine Image (AMI). Select an Ubuntu instance (:numref:fig_ubuntu).
:width:700px
:label:fig_ubuntu
EC2 provides many different instance configurations to choose from. This can sometimes feel overwhelming to a beginner. :numref:tab_ec2 lists different suitable machines.
| Name | GPU | Notes |
|---|---|---|
| g2 | Grid K520 | ancient |
| p2 | Kepler K80 | old but often cheap as spot |
| g3 | Maxwell M60 | good trade-off |
| p3 | Volta V100 | high performance for FP16 |
| p4 | Ampere A100 | high performance for large-scale training |
| g4 | Turing T4 | inference optimized FP16/INT8 |
All these servers come in multiple flavors indicating the number of GPUs used. For example, a p2.xlarge has 1 GPU and a p2.16xlarge has 16 GPUs and more memory. For more details, see the AWS EC2 documentation or a summary page. For the purpose of illustration, a p2.xlarge will suffice (marked in the red box of :numref:fig_p2x).
:width:700px
:label:fig_p2x
Note that you should use a GPU-enabled instance with suitable drivers and a GPU-enabled deep learning framework. Otherwise you will not see any benefit from using GPUs.
We go on to select the key pair used to access the instance. If you do not have a key pair, click “Create new key pair” in :numref:fig_keypair to generate a key pair. Subsequently, you can select the previously generated key pair. Make sure that you download the key pair and store it in a safe location if you generated a new one. This is your only way to SSH into the server.
:width:500px
:label:fig_keypair
In this example, we will keep the default configurations for “Network settings” (click the “Edit” button to configure items such as the subnet and security groups). We just increase the default hard disk size to 64 GB (:numref:fig_disk). Note that CUDA by itself already takes up 4 GB.
:width:700px
:label:fig_disk
Click “Launch Instance” to launch the created instance. Click the instance ID shown in :numref:fig_launching to view the status of this instance.
:width:700px
:label:fig_launching
4.1.4 Connecting to the Instance
As shown in :numref:fig_connect, after the instance state turns green, right-click the instance and select Connect to view the instance access method.
:width:700px
:label:fig_connect
If this is a new key, it must not be publicly viewable for SSH to work. Go to the folder where you store D2L_key.pem and execute the following command to make the key not publicly viewable:
chmod 400 D2L_key.pem
:width:400px
:label:fig_chmod
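The chmod 400 step can also be reproduced from Python, which makes the permission bits explicit. The temporary file below stands in for your real key file:

```python
import os
import stat
import tempfile

# Create a throwaway file standing in for the key (do not run this on your real key
# path unless you intend to change its permissions).
fd, path = tempfile.mkstemp(suffix=".pem")
os.close(fd)

os.chmod(path, stat.S_IRUSR)  # owner read-only, i.e. exactly mode 400
mode = stat.S_IMODE(os.stat(path).st_mode)
print(oct(mode))  # → 0o400
os.remove(path)
```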
Now, copy the SSH command in the lower red box of :numref:fig_chmod and paste onto the command line:
ssh -i "D2L_key.pem" ubuntu@ec2-xx-xxx-xxx-xxx.y.compute.amazonaws.com
When the command line prompts “Are you sure you want to continue connecting (yes/no)”, enter “yes” and press Enter to log into the instance.
Your server is ready now.
4.2 Installing CUDA
Before installing CUDA, be sure to update the instance with the latest drivers.
sudo apt-get update && sudo apt-get install -y build-essential git libgfortran5
Here we download CUDA 12.1. Visit NVIDIA’s official repository to find the download link as shown in :numref:fig_cuda.
:width:500px
:label:fig_cuda
Copy the instructions and paste them onto the terminal to install CUDA 12.1.
# The link and file name are subject to changes
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda-repo-ubuntu2204-12-1-local_12.1.0-530.30.02-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-1-local_12.1.0-530.30.02-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-1-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
After installing the program, run the following command to view the GPUs:
nvidia-smi
Finally, add CUDA to the library path to help other libraries find it, such as appending the following lines to the end of ~/.bashrc.
export PATH="/usr/local/cuda-12.1/bin:$PATH"
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda-12.1/lib64
4.3 Installing Libraries for Running the Code
To run the code of this book, just follow steps in :ref:chap_installation for Linux users on the EC2 instance and use the following tips for working on a remote Linux server:
- To download the bash script from the Miniconda installation page, right-click the download link, select “Copy Link Address”, then execute wget [copied link address].
- After running ~/miniconda3/bin/conda init, you may execute source ~/.bashrc instead of closing and reopening your current shell.
4.4 Running the Jupyter Notebook Remotely
To run the Jupyter Notebook remotely you need to use SSH port forwarding. After all, the server in the cloud does not have a monitor or keyboard. For this, log into your server from your desktop (or laptop) as follows:
# This command must be run in the local command line
ssh -i "/path/to/key.pem" ubuntu@ec2-xx-xxx-xxx-xxx.y.compute.amazonaws.com -L 8889:localhost:8888
Next, go to the location of the downloaded code of this book on the EC2 instance, then run:
conda activate d2l
jupyter notebook
:numref:fig_jupyter shows the possible output after you run the Jupyter Notebook. The last row is the URL for port 8888.
:width:700px
:label:fig_jupyter
Since you used port forwarding to port 8889, copy the last row in the red box of :numref:fig_jupyter, replace “8888” with “8889” in the URL, and open it in your local browser.
4.5 Closing Unused Instances
As cloud services are billed by the time of use, you should close instances that are not being used. Note that there are two alternatives:
- “Stopping” an instance means that you will be able to start it again. This is akin to switching off the power for your regular server. However, stopped instances will still be billed a small amount for the hard disk space retained.
- “Terminating” an instance will delete all data associated with it. This includes the disk, hence you cannot start it again. Only do this if you know that you will not need it in the future.
If you want to use the instance as a template for many more instances, right-click the instance in :numref:fig_connect and select “Image” → “Create” to create an image of the instance. Once this is complete, select “Instance State” → “Terminate” to terminate the instance. The next time you want to use this instance, you can follow the steps in this section to create an instance based on the saved image. The only difference is that, in “1. Choose AMI” shown in :numref:fig_ubuntu, you must use the “My AMIs” option on the left to select your saved image. The created instance will retain the information stored on the image hard disk. For example, you will not have to reinstall CUDA and other runtime environments.
4.6 Summary
- We can launch and stop instances on demand without having to buy and build our own computer.
- We need to install CUDA before using the GPU-enabled deep learning framework.
- We can use port forwarding to run the Jupyter Notebook on a remote server.
4.7 Exercises
- The cloud offers convenience, but it does not come cheap. Find out how to launch spot instances to reduce costs.
- Experiment with different GPU servers. How fast are they?
- Experiment with multi-GPU servers. How well can you scale things up?
5 Selecting Servers and GPUs
:label:sec_buy_gpu
Deep learning training generally requires large amounts of computation. At present GPUs are the most cost-effective hardware accelerators for deep learning. In particular, compared with CPUs, GPUs are cheaper and offer higher performance, often by over an order of magnitude. Furthermore, a single server can support multiple GPUs, up to 8 for high end servers. More typical numbers are up to 4 GPUs for an engineering workstation, since heat, cooling, and power requirements escalate quickly beyond what an office building can support. For larger deployments, cloud computing (e.g., Amazon’s P3 and G4 instances) is a much more practical solution.
5.1 Selecting Servers
There is typically no need to purchase high-end CPUs with many threads since much of the computation occurs on the GPUs. That said, due to the global interpreter lock (GIL) in Python, the single-thread performance of a CPU can matter in situations where we have 4–8 GPUs. All things being equal, this suggests that CPUs with a smaller number of cores but a higher clock frequency might be a more economical choice. For example, when choosing between a 6-core 4 GHz and an 8-core 3.5 GHz CPU, the former is much preferable, even though its aggregate speed is less. An important consideration is that GPUs use lots of power and thus dissipate lots of heat. This requires very good cooling and a large enough chassis to use the GPUs. Follow the guidelines below if possible:
- Power Supply. GPUs use significant amounts of power. Budget with up to 350W per device (check for the peak demand of the graphics card rather than typical demand, since efficient code can use lots of energy). If your power supply is not up to the demand you will find that your system becomes unstable.
- Chassis Size. GPUs are large and the auxiliary power connectors often need extra space. Also, large chassis are easier to cool.
- GPU Cooling. If you have a large number of GPUs you might want to invest in water cooling. Also, aim for reference designs even if they have fewer fans, since they are thin enough to allow for air intake between the devices. If you buy a multi-fan GPU it might be too thick to get enough air when installing multiple GPUs and you will run into thermal throttling.
- PCIe Slots. Moving data to and from the GPU (and exchanging it between GPUs) requires lots of bandwidth. We recommend PCIe 3.0 slots with 16 lanes. If you mount multiple GPUs, be sure to carefully read the motherboard description to ensure that ×16 bandwidth is still available when multiple GPUs are used at the same time and that you are getting PCIe 3.0 as opposed to PCIe 2.0 for the additional slots. Some motherboards downgrade to ×8 or even ×4 bandwidth with multiple GPUs installed. This is partly due to the number of PCIe lanes that the CPU offers.
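The clock-versus-cores remark above can be made concrete with quick arithmetic:

```python
# Aggregate core-GHz for the two CPUs compared in the text.
six_core = 6 * 4.0     # 24.0 core-GHz at the higher clock
eight_core = 8 * 3.5   # 28.0 core-GHz at the lower clock
print(six_core, eight_core)  # → 24.0 28.0
# The 6-core part has less aggregate throughput, but with 4-8 GPUs the
# GIL-bound data-feeding path runs on a single thread, so the 4 GHz clock wins.
```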
In short, here are some recommendations for building a deep learning server:
- Beginner. Buy a low end GPU with low power consumption (cheap gaming GPUs suitable for deep learning use 150–200W). If you are lucky your current computer supports it.
- 1 GPU. A low-end CPU with 4 cores will be sufficient and most motherboards suffice. Aim for at least 32 GB DRAM and invest into an SSD for local data access. A power supply with 600W should be sufficient. Buy a GPU with lots of fans.
- 2 GPUs. A low-end CPU with 4-6 cores will suffice. Aim for 64 GB DRAM and invest into an SSD. You will need in the order of 1000W for two high-end GPUs. In terms of mainboards, make sure that they have two PCIe 3.0 x16 slots. If you can, get a mainboard that has two free spaces (60mm spacing) between the PCIe 3.0 x16 slots for extra air. In this case, buy two GPUs with lots of fans.
- 4 GPUs. Make sure that you buy a CPU with relatively fast single-thread speed (i.e., high clock frequency). You will probably need a CPU with a larger number of PCIe lanes, such as an AMD Threadripper. You will likely need relatively expensive mainboards to get 4 PCIe 3.0 x16 slots since they probably need a PLX to multiplex the PCIe lanes. Buy GPUs with reference design that are narrow and let air in between the GPUs. You need a 1600–2000W power supply and the outlet in your office might not support that. This server will probably run loud and hot. You do not want it under your desk. 128 GB of DRAM is recommended. Get an SSD (1–2 TB NVMe) for local storage and a bunch of hard disks in RAID configuration to store your data.
- 8 GPUs. You need to buy a dedicated multi-GPU server chassis with multiple redundant power supplies (e.g., 2+1 for 1600W per power supply). This will require dual socket server CPUs, 256 GB ECC DRAM, a fast network card (10 GbE recommended), and you will need to check whether the servers support the physical form factor of the GPUs. Airflow and wiring placement differ significantly between consumer and server GPUs (e.g., RTX 2080 vs. Tesla V100). This means that you might not be able to install the consumer GPU in a server due to insufficient clearance for the power cable or lack of a suitable wiring harness (as one of the coauthors painfully discovered).
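A rough power budget ties these recommendations together. The 350 W per-GPU figure comes from the guidelines above; the base load and safety margin are assumptions for illustration:

```python
def psu_watts(gpus: int, per_gpu: int = 350, base: int = 200, margin: float = 1.25) -> float:
    """Peak PSU estimate: GPUs plus CPU/disks/fans (base), with a safety margin."""
    return (gpus * per_gpu + base) * margin

for n in (1, 2, 4):
    print(n, "GPUs:", psu_watts(n), "W")
```

The 4-GPU case lands at 2000 W, consistent with the 1600–2000 W range recommended above.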
5.2 Selecting GPUs
At present, AMD and NVIDIA are the two main manufacturers of dedicated GPUs. NVIDIA was the first to enter the deep learning field and provides better support for deep learning frameworks via CUDA. Therefore, most buyers choose NVIDIA GPUs.
NVIDIA provides two types of GPUs, targeting individual users (e.g., via the GTX and RTX series) and enterprise users (via its Tesla series). The two types of GPUs provide comparable compute power. However, the enterprise GPUs generally use (passive) forced cooling and offer more memory as well as ECC (error-correcting) memory. These GPUs are more suitable for data centers and usually cost ten times more than consumer GPUs.
If you are a large company with 100+ servers you should consider the NVIDIA Tesla series or alternatively use GPU servers in the cloud. For a lab or a small to medium company with 10+ servers the NVIDIA RTX series is likely most cost effective. You can buy preconfigured servers with Supermicro or Asus chassis that hold 4–8 GPUs efficiently.
GPU vendors typically release a new generation every one to two years, such as the GTX 1000 (Pascal) series released in 2016 and the RTX 2000 (Turing) series released in 2018. Each series offers several different models that provide different performance levels. GPU performance is primarily a combination of the following three parameters:
- Compute Power. Generally we look for 32-bit floating-point compute power. 16-bit floating point training (FP16) is also entering the mainstream. If you are only interested in prediction, you can also use 8-bit integer. The latest generation of Turing GPUs offers 4-bit acceleration. Unfortunately at the time of writing the algorithms for training low-precision networks are not yet widespread.
- Memory Size. As your models become larger or the batches used during training grow bigger, you will need more GPU memory. Check for HBM2 (High Bandwidth Memory) vs. GDDR6 (Graphics DDR) memory. HBM2 is faster but much more expensive.
- Memory Bandwidth. You can only get the most out of your compute power when you have sufficient memory bandwidth. Look for wide memory buses if using GDDR6.
For most users, it is enough to look at compute power. Note that many GPUs offer different types of acceleration. For example, NVIDIA’s TensorCores accelerate a subset of operators by 5×. Ensure that your libraries support this. The GPU memory should be no less than 4 GB (8 GB is much better). Try to avoid using the GPU also for displaying a GUI (use the built-in graphics instead). If you cannot avoid it, add an extra 2 GB of RAM for safety.
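A back-of-envelope estimate shows how precision interacts with the memory requirement; the 25M-parameter model size is an assumption for illustration, not a figure from the text:

```python
# Memory footprint of just the weights for an assumed 25M-parameter model.
params = 25_000_000
for label, bytes_per in (("FP32", 4), ("FP16", 2), ("INT8", 1)):
    gb = params * bytes_per / 1e9
    print(f"{label}: {gb:.2f} GB")
# Activations, gradients, and optimizer state multiply the weight footprint
# several times over during training, which is why 8 GB is "much better" than 4 GB.
```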
:numref:fig_flopsvsprice compares the 32-bit floating-point compute power and price of the various GTX 900, GTX 1000 and RTX 2000 series models. The prices suggested are those found on Wikipedia at the time of writing.
:label:fig_flopsvsprice
We can see a number of things:
- Within each series, price and performance are roughly proportional. Titan models command a significant premium for the benefit of larger amounts of GPU memory. However, the newer models offer better cost effectiveness, as can be seen by comparing the 980 Ti and 1080 Ti. Cost effectiveness does not appear to improve much for the RTX 2000 series. However, this is because they offer far superior low-precision performance (FP16, INT8, and INT4).
- The performance-to-cost ratio of the GTX 1000 series is about two times greater than the 900 series.
- For the RTX 2000 series the performance (in GFLOPs) is an affine function of the price.
:label:fig_wattvsprice
:numref:fig_wattvsprice shows two things. First, energy consumption scales mostly linearly with the amount of computation. Second, later generations are more efficient. This seems to be contradicted by the graph for the RTX 2000 series. However, this is a consequence of the TensorCores, which draw disproportionately much energy.
5.3 Summary
- Watch out for power, PCIe bus lanes, CPU single thread speed, and cooling when building a server.
- You should purchase the latest GPU generation if possible.
- Use the cloud for large deployments.
- High density servers may not be compatible with all GPUs. Check the mechanical and cooling specifications before you buy.
- Use FP16 or lower precision for high efficiency.